Analysis of imbalanced data set problem: The case of churn prediction for telecommunication
Abstract
Class-imbalanced datasets are common in the mobile Internet industry. We tested three feature selection techniques: Random Forest (RF), Relative Weight (RW) and Standardized Regression Coefficients (SRC); three balancing methods: over-sampling (OS), under-sampling (US) and synthetic minority over-sampling (SMOTE); and one widely used classification method, RF. Each combined model consists of a feature selection technique, a balancing technique and the classification method. The original dataset, which has 45 thousand records and 22 features, was used to evaluate the performance of both the feature selection and the balancing techniques. The experimental results revealed that SRC combined with SMOTE attained the minimum cost (Cost = 1085). By calculating the Cost for all models, the most important features for minimizing cost in telecommunication were identified. Applying these combined models could help maximize profit with minimum expenditure on customer retention and help reduce customer churn rates.
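The three balancing methods named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the toy two-feature data and the neighbour count `k=3` are all assumptions chosen for illustration, and the RF classifier and Cost calculation from the paper are not reproduced here.

```python
import math
import random


def split_by_class(X, y, minority_label=1):
    """Separate rows into minority and majority lists (hypothetical helper)."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    majority = [x for x, label in zip(X, y) if label != minority_label]
    return minority, majority


def random_oversample(minority, target, rng):
    """OS: duplicate randomly chosen minority rows until `target` rows exist."""
    extra = [rng.choice(minority) for _ in range(target - len(minority))]
    return minority + extra


def random_undersample(majority, target, rng):
    """US: keep a random subset of `target` majority rows."""
    return rng.sample(majority, target)


def smote(minority, target, rng, k=3):
    """SMOTE-style sketch: interpolate between a minority row and one of
    its k nearest minority neighbours until `target` rows exist."""
    synthetic = []
    while len(minority) + len(synthetic) < target:
        base = rng.choice(minority)
        # Nearest minority neighbours of `base`, excluding `base` itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return minority + synthetic


# Toy data (an assumption): 20 majority rows, 4 minority rows, 2 features.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
X += [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(4)]
y = [0] * 20 + [1] * 4

minority, majority = split_by_class(X, y)
over = random_oversample(minority, len(majority), rng)
under = random_undersample(majority, len(minority), rng)
synth = smote(minority, len(majority), rng)
print(len(over), len(under), len(synth))  # prints: 20 4 20
```

Over-sampling and SMOTE both grow the minority class to the size of the majority class, while under-sampling shrinks the majority class instead. The difference is that SMOTE creates new points between existing minority rows rather than repeating them.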
Related articles
Enhancing the Performance of the Classifiers for Customer Churn Analysis in Telecommunication Data using EMOTE
Customer churn refers to customers who are at risk of leaving the company. The growing number of such customers is becoming critical for the telecommunication sector, which must retain them to avoid revenue loss. Predicting such behaviour is essential for the telecom sector, and classifiers have proved effective for the same...
Churn prediction in telecom using Random Forest and PSO based data balancing in combination with various feature selection strategies
The telecommunication industry faces fierce competition to retain customers, and therefore requires an efficient churn prediction model to monitor the customer’s churn. Enormous size, high dimensionality and imbalanced nature of telecommunication datasets are main hurdles in attaining the desired performance for churn prediction. In this study, we investigate the significance of a Particle Swar...
Social Network Analysis for Churn Prediction in Telecom Data
Social Network Analysis (SNA) is a set of research procedures for identifying groups of people who share common structures in systems, based on the relations among actors. Grounded in graph and system theories, this approach has proven to be a powerful measure for studying networks in various industries such as telecommunication, banking, physics and the social world, including on the web. Since Telecomm...
Neighborhood Cleaning Rules and Particle Swarm Optimization for Predicting Customer Churn Behavior in Telecom Industry
Churn prediction is an important task for Customer Relationship Management (CRM) in telecommunication companies. Accurate churn prediction helps CRM in planning effective strategies to retain their valuable customers. However, churn prediction is a complex and challenging task. In this paper, a hybrid churn prediction model is proposed based on combining two approaches; Neighborhood Cleaning Ru...
Hierarchical Alpha-cut Fuzzy C-means, Fuzzy ARTMAP and Cox Regression Model for Customer Churn Prediction
As customers are the main asset of any organization, customer churn management is becoming a major task for organizations to retain their valuable customers. In the previous studies, the applicability and efficiency of hierarchical data mining techniques for churn prediction by combining two or more techniques have been proved to provide better performances than many single techniques over a nu...
Journal title:
- Artif. Intell. Research
Volume 6
Publication date: 2017